Skip to content

Bump hyperlight-host to new snapshot api #128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

ludfjig
Copy link
Contributor

@ludfjig ludfjig commented Jul 24, 2025

Bumps hyperlight dependency. Reworks uses of MultiuseSandboxContext to use snapshots instead.

Memory is no longer restored after a call to LoadedWasmSandbox::call_guest_function which revealed that memory is leaked during mashalling, and a missing post_call after calling a wasmtime func. In addition, a Store and wasm Instance are no longer created per guest function call. Rather, it's being created once and reused.

Copy link
Contributor

@dblnz dblnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work updating hyperlight-wasm to use the new snapshoting API.
I left a couple of comments/questions. Let me know what you think.

marosset and others added 2 commits July 28, 2025 10:51
…ead of mshv2)

Signed-off-by: Mark Rossett <marosset@microsoft.com>
Update all hyperlight dependencies from various revisions to a unified
172fcfa69b0f9064c7a0e48e512f8a86ae1fdbe1 snapshot that includes
snapshot/restore functionality.

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Replace transition-based sandbox evolution with direct snapshot/restore API:

- Remove EvolvableSandbox/DevolvableSandbox trait usage
- Replace transition callbacks with direct method calls
- Add snapshot/restore methods to LoadedWasmSandbox
- Store initial snapshots in WasmSandbox for efficient unloading
- Export Snapshot type in public API

This provides more direct control over sandbox state management
and enables features like checkpoint/restore.

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
}

/// Restore the state of the sandbox to the state captured in the given snapshot.
pub fn restore(&mut self, snapshot: &Snapshot) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have a pub fn new_from_dirty_sandbox_and_snapshot(sandbox_to_be_erased: MultiUseSandbox, loaded_snapshot: Snapshot) (probably with less verbose names) or something like that?

I think that the common use case for e.g. function services is going to be "have a pool of sandboxes in various states, and whenever a request comes in, grab one, restore a sanpshot for the correct customer into it, and continue", which means that you would indeed want a function here which can just take any hyperlight sandbox in any state and restore the snapshot of runtime + image into it.

Copy link
Contributor Author

@ludfjig ludfjig Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would work quite yet since restoring only works for snapshots from the same sandbox right now. But that does sound good in general

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this can still be useful even with the per-sandbox snapshots, since you can still rotate between a bunch of snapshots on the same sandbox and keep them cached aroun.

)?;

// Track any allocations for later cleanup
if !allocated_addrs.is_empty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way that we are deferring the cleanup here makes me feel like we are not being precise about the lifetime and ownership of parameter/return data (in the informal sense, not the Rust sense). I guess that we are saying here that whenever you call into a sandbox, the parameters and the return values from any host call, both live until you return from that call. That seems like a bit of a strange and arbitrary invariant to me, and if we really want to go down that path we should definitely document it.

Why doesn't the wasm inside the component free these values itself, whenever it is done with them? It seems like their lifetimes only affect wasm memory (since there aren't any e.g. resource handles which refer out of the sandbox in the module variant), and so it seems like it ought to be a guest concern to free them? I think that would make all of this more similar to the component model design as well.

Instead of enforcing this lifetime this way, we could perhaps think about finding a way to add something to audit the guest heap usage, or something like that? I'm not super sure what would make sense here, but I don't really like this approach to deciding how long the memory in wasm lives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed, please have another look at marshal.rs top of file comment.

ludfjig added 2 commits August 1, 2025 12:39
- Reuse wasmtime Store and Instance across guest function calls instead
  of creating new one per call.
- Establish memory contract between host and guest.
 - Guest functions takes ownership of input parameters
 - Guest transfer ownership of return values
 - Host functions parameters are borrowed from guest
 - Host function return values are owned by guest and guest must free them.
- Component: Add post_return calls for proper WASM function cleanup
- Fix ABI mismatch in parameter of guest_dispatch_function

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Comment on lines +224 to +225
let #ret = func.call(&mut *store, (#(#pus,)*))?.0;
func.post_return(&mut *store)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch, according to the docs "embedders must unconditionally call TypedFunc::post_return"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah thanks for catching that!!

//! `free` function exported from the guest module.
//!
//! ## Guest Function Return Values (Guest → Host)
//! - When guest functions return String or VecBytes values, the guest allocates memory in its
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe clarify that this means that the guest has to use the malloc corresponding to its exported free for these.

The alternative here would be to do the same post_return thing that we do in component land.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants